fix(cypher,store): prevent crashes from buffer overflow, OOM, and NULL stmts by map588 · Pull Request #206 · DeusData/codebase-memory-mcp

map588 · 2026-04-05T01:00:46Z

cypher: Add bounds check in lex_string_literal to prevent stack buffer overflow on string literals >4096 bytes
cypher: Add malloc/calloc NULL checks in parse_props, parse_rel_types, parse_in_condition, and parse_case_expr to prevent OOM crashes
store: Add sqlite3_prepare_v2 return code checks at 3 sites in cbm_store_schema_info and collect_pkg_names to prevent NULL stmt dereference on DB corruption

Copilot

Pull request overview

This PR hardens the Cypher lexer/parser and the SQLite-backed store layer against crash scenarios triggered by unusually large inputs, out-of-memory conditions, and SQLite statement preparation failures (e.g., due to DB corruption).

Changes:

Cypher lexer: add a bounds check in lex_string_literal and a regression test for oversized string literals.
Cypher parser: add initial malloc/calloc NULL checks in several parse helpers to reduce OOM crash risk.
Store: add sqlite3_prepare_v2 return-code/NULL-stmt checks at three call sites to avoid NULL sqlite3_stmt* dereferences.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File	Description
`src/cypher/cypher.c`	Adds string-literal truncation logic and some initial allocation NULL checks in parsing helpers.
`src/store/store.c`	Adds SQLite prepare checks to avoid dereferencing NULL statements during schema/package collection.
`tests/test_cypher.c`	Adds a regression test to ensure oversized string literals don’t overflow lexer buffers.

Comments suppressed due to low confidence (2)

src/store/store.c:2564

arr is allocated without a NULL check. On OOM this will be NULL and the arr[n]... writes in the row loop will crash; safe_realloc can also return NULL later in the loop. Since this PR is explicitly hardening OOM cases, consider adding allocation checks and returning an error (finalizing stmt first) if allocation fails.

        int cap = ST_INIT_CAP_8;
        int n = 0;
        cbm_label_count_t *arr = malloc(cap * sizeof(cbm_label_count_t));
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            if (n >= cap) {

src/store/store.c:2591

arr is allocated without a NULL check. On OOM this will be NULL and the arr[n]... writes in the row loop will crash; safe_realloc can also return NULL later in the loop. Please add allocation-result checks and handle failure (including finalizing stmt) to fully prevent OOM crashes.

        int cap = ST_INIT_CAP_8;
        int n = 0;
        cbm_type_count_t *arr = malloc(cap * sizeof(cbm_type_count_t));
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            if (n >= cap) {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-05T01:05:53Z

src/cypher/cypher.c

    int blen = 0;
+    const int max_blen = CBM_SZ_4K - 1;
    while (*pos < len && input[*pos] != quote) {
+        if (blen >= max_blen) { (*pos)++; continue; }


The truncation path (if (blen >= max_blen) { (*pos)++; continue; }) skips escape-handling, so once the buffer is full a sequence like \" can be mis-tokenized: the backslash is skipped but the following quote is then seen as the closing delimiter, ending the string early. Consider still consuming escape sequences while truncating (advance past both '\' and the escaped char) so the lexer always finds the real closing quote.

Suggested change

if (blen >= max_blen) { (*pos)++; continue; }

if (blen >= max_blen) {

if (input[*pos] == '\\' && *pos + SKIP_ONE < len) {

*pos += SKIP_ONE;

}

(*pos)++;

continue;

}

Copilot · 2026-04-05T01:05:53Z

src/store/store.c

+        if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) {
+            return CBM_NOT_FOUND;
+        }


If the edge-types sqlite3_prepare_v2 fails, this returns immediately without freeing any node-label data already stored in out from the previous block. Some callers only call cbm_store_schema_free on CBM_STORE_OK (e.g., UI layout3d), so this can leak memory. Consider using a single cleanup path that frees any partially-filled out before returning an error.

Copilot · 2026-04-05T01:05:53Z

src/cypher/cypher.c

    int cap = CYP_INIT_CAP4;
    int n = 0;
    cbm_prop_filter_t *arr = malloc(cap * sizeof(cbm_prop_filter_t));
+    if (!arr) {
+        return CBM_NOT_FOUND;
+    }


This adds an initial malloc NULL-check, but the function still calls safe_realloc and heap_strdup later without checking for NULL. Since safe_realloc returns NULL (and frees the old pointer), an OOM during growth or key/value duplication can still lead to NULL dereferences (e.g., later strcmp on a NULL props[i].value). Consider propagating an error if any allocation/duplication fails and freeing partially-built arr.

Copilot · 2026-04-05T01:05:54Z

src/cypher/cypher.c

    int cap = CYP_INIT_CAP4;
    int n = 0;
    const char **types = malloc(cap * sizeof(const char *));
+    if (!types) {
+        return CBM_NOT_FOUND;
+    }


This adds an initial malloc NULL-check, but types can still become NULL if a later safe_realloc fails (it returns NULL and frees the old buffer), and heap_strdup can return NULL as well. To fully prevent OOM crashes, please check realloc/duplication results, free any already-duplicated strings on failure, and return an error.

Copilot · 2026-04-05T01:05:54Z

src/cypher/cypher.c

    int vcap = CYP_INIT_CAP8;
    int vn = 0;
    const char **vals = malloc(vcap * sizeof(const char *));
+    if (!vals) {
+        free((void *)c->variable);
+        free((void *)c->property);
+        free((void *)c->op);
+        return NULL;


Good to check the initial malloc, but the list-growth path later uses safe_realloc without checking for NULL (it frees the old pointer on failure), and each element is populated via heap_strdup which may return NULL. Either can still produce a NULL dereference later when evaluating IN conditions (e.g., strcmp(actual, c->in_values[i])). Please add checks and clean up partial allocations on failure.

Copilot · 2026-04-05T01:05:54Z

src/cypher/cypher.c

    cbm_case_expr_t *kase = calloc(CBM_ALLOC_ONE, sizeof(cbm_case_expr_t));
+    if (!kase) {
+        return NULL;
+    }
    int bcap = CYP_INIT_CAP4;
    kase->branches = malloc(bcap * sizeof(cbm_case_branch_t));
+    if (!kase->branches) {
+        free(kase);
+        return NULL;
+    }


This checks calloc/malloc for the initial CASE expression allocations, but the branch-growth path later uses safe_realloc without checking for NULL (and safe_realloc frees the old buffer on failure). To avoid NULL dereferences/leaks on OOM, please check the realloc result and free kase/branches before returning an error.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-05T02:36:56Z

src/cypher/cypher.c

        if (n >= cap) {
            cap *= PAIR_LEN;
            arr = safe_realloc(arr, cap * sizeof(cbm_prop_filter_t));
+            if (!arr) {
+                return CBM_NOT_FOUND;
+            }


safe_realloc frees the original buffer on failure (see platform.h), so if this growth realloc fails you return with arr already freed but any previously heap_strdup'd keys/values are leaked (and you can't clean them up anymore). Use a non-destructive realloc pattern (tmp=realloc; on failure, free arr[i].key/value for i<n, then free(arr)) so OOM handling is leak-free.

Copilot · 2026-04-05T02:36:56Z

src/cypher/cypher.c

        if (n >= cap) {
            cap *= PAIR_LEN;
            types = safe_realloc(types, cap * sizeof(const char *));
+            if (!types) {
+                return CBM_NOT_FOUND;
+            }


Same safe_realloc issue as in parse_props: on realloc failure the types array is freed, but already-duplicated strings in types[0..n-1] leak. Prefer realloc into a temporary pointer so you can free the existing types[i] strings and the array on failure.

Copilot · 2026-04-05T02:36:56Z

src/cypher/cypher.c

                vcap *= PAIR_LEN;
                vals = safe_realloc(vals, vcap * sizeof(const char *));
+                if (!vals) {
+                    free((void *)c->variable);
+                    free((void *)c->property);
+                    free((void *)c->op);
+                    return NULL;
+                }


On realloc failure, safe_realloc frees the existing vals array, so returning here leaks all heap_strdup'd values already stored in vals[0..vn-1]. Use realloc into a temp pointer (or a realloc helper that does not free on failure) so you can free the accumulated vals[i] strings + vals before returning.

Suggested change

vcap *= PAIR_LEN;

vals = safe_realloc(vals, vcap * sizeof(const char *));

if (!vals) {

free((void *)c->variable);

free((void *)c->property);

free((void *)c->op);

return NULL;

}

const char **new_vals;

int i;

vcap *= PAIR_LEN;

new_vals = realloc((void *)vals, vcap * sizeof(const char *));

if (!new_vals) {

for (i = 0; i < vn; i++) {

free((void *)vals[i]);

}

free((void *)vals);

free((void *)c->variable);

free((void *)c->property);

free((void *)c->op);

return NULL;

}

vals = new_vals;

Copilot · 2026-04-05T02:36:56Z

src/cypher/cypher.c

        if (kase->branch_count >= bcap) {
            bcap *= PAIR_LEN;
            kase->branches = safe_realloc(kase->branches, bcap * sizeof(cbm_case_branch_t));
+            if (!kase->branches) {
+                expr_free(when);
+                free(kase);
+                return NULL;
+            }


On branch growth, safe_realloc frees the existing kase->branches array on failure; returning here then leaks all previously parsed when_expr trees and then_val strings already stored in the old branches (since you no longer have the array to iterate). Switch to a non-destructive realloc pattern and, on failure, call free_case_expr(kase) (or equivalent partial cleanup) before returning.

Copilot · 2026-04-05T02:36:57Z

src/store/store.c

+        if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) {
+            return CBM_NOT_FOUND;
+        }


If sqlite3_prepare_v2 fails it can still leave a non-NULL statement handle in some cases (e.g., partial compile). Returning without finalizing risks leaking stmt. Consider capturing rc, and if stmt is non-NULL always sqlite3_finalize(stmt) before returning (and optionally record the SQLite error).

Copilot · 2026-04-05T02:36:57Z

src/store/store.c

+        if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) {
+            cbm_store_schema_free(out);
+            return CBM_NOT_FOUND;
+        }


Same as above: on prepare failure, ensure any non-NULL stmt is finalized before returning. Right now this path can leak stmt (and you already free out, which is good).

Copilot · 2026-04-05T02:36:57Z

src/store/store.c

                             int max_pkgs) {
    sqlite3_stmt *stmt = NULL;
-    sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL);
+    if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) {


On prepare failure, consider finalizing stmt if it is non-NULL before returning to avoid leaking a partially prepared statement. Also consider recording the underlying SQLite error so callers have some diagnostics when DB corruption triggers this path.

Suggested change

if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) {

int rc = sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL);

if (rc != SQLITE_OK || !stmt) {

if (stmt != NULL) {

sqlite3_finalize(stmt);

}

…L stmts - cypher: Add bounds check in lex_string_literal to prevent stack buffer overflow on string literals >4096 bytes. Escape sequences are always parsed correctly even past the truncation boundary. - cypher: Add malloc/calloc NULL checks in parse_props, parse_rel_types, parse_in_condition, and parse_case_expr. Growth paths use non-destructive realloc (temp pointer) so accumulated elements can be freed on OOM instead of leaking through safe_realloc's free-on-failure semantics. - store: Add sqlite3_prepare_v2 return code checks at 3 sites in cbm_store_schema_info and collect_pkg_names. Partially prepared statements are finalized before returning. Schema function cleans up partially populated output on failure. collect_pkg_names returns CBM_NOT_FOUND (not 0) to distinguish errors from empty results.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-05T03:38:03Z

src/cypher/cypher.c

+            arr = tmp;
+            cap = new_cap;
        }
        arr[n].key = heap_strdup(key->text);
        arr[n].value = heap_strdup(val->text);


parse_props assigns arr[n].key/value from heap_strdup without checking for NULL. If heap_strdup fails, later code like check_inline_props() will call strcmp(props[i].value, ...) and can segfault. After heap_strdup, validate the returned pointers; on failure, free any previously allocated entries (and arr itself) and return an error.

Copilot · 2026-04-05T03:38:04Z

src/cypher/cypher.c

+            types = (const char **)tmp;
+            cap = new_cap;
        }
        types[n++] = heap_strdup(t->text);
    }


parse_rel_types stores relationship type strings via heap_strdup but never checks for NULL. If an allocation fails, rel->types will contain NULL and later strcmp/SQLite binding paths can dereference it. After heap_strdup, check for NULL and, if needed, free any previously duplicated types and return an error.

Copilot · 2026-04-05T03:38:04Z

src/cypher/cypher.c

+                }
+                vals = (const char **)tmp;
+                vcap = new_vcap;
            }
            vals[vn++] = heap_strdup(advance(p)->text);
        } else {


parse_in_list appends list items via heap_strdup without checking for NULL. eval_condition later does strcmp(actual, c->in_values[i]) and will crash if any entry is NULL. Check heap_strdup’s return; on failure, free vals (including already-duplicated strings) and c’s allocated fields, then return NULL with an error set.

Copilot · 2026-04-05T03:38:04Z

src/cypher/cypher.c

+            if (!tmp) {
+                expr_free(when);
+                for (int i = 0; i < kase->branch_count; i++) {
+                    expr_free(kase->branches[i].when_expr);


On realloc failure, the cleanup loop frees only when_expr for existing branches but not branches[i].then_val. That leaks the already-duplicated THEN values (and can be substantial for large queries) right in the OOM path this change is trying to harden. Free then_val for each existing branch in the failure cleanup before freeing branches/kase.

Suggested change

expr_free(kase->branches[i].when_expr);

expr_free(kase->branches[i].when_expr);

free((void *)kase->branches[i].then_val);

Copilot · 2026-04-05T03:38:04Z

src/store/store.c

+        if (stmt) {
+            sqlite3_finalize(stmt);
+        }
+        return CBM_NOT_FOUND;


collect_pkg_names returns CBM_NOT_FOUND on sqlite3_prepare_v2 failure, which is the same numeric value as CBM_STORE_ERR (-1) but semantically inconsistent with other store helpers that return CBM_STORE_ERR and call store_set_error_sqlite(). Consider returning CBM_STORE_ERR (or 0 if you intend “no pkgs”) and setting the store error so downstream callers can diagnose DB corruption.

Suggested change

return CBM_NOT_FOUND;

store_set_error_sqlite(s, "sqlite3_prepare_v2 failed in collect_pkg_names");

return CBM_STORE_ERR;

Copilot AI review requested due to automatic review settings April 5, 2026 01:00

Copilot started reviewing on behalf of map588 April 5, 2026 01:01 View session

Copilot AI reviewed Apr 5, 2026

View reviewed changes

map588 force-pushed the fix/crash-safety branch from 77923d6 to c58796c Compare April 5, 2026 01:22

map588 requested a review from Copilot April 5, 2026 02:32

Copilot started reviewing on behalf of map588 April 5, 2026 02:33 View session

Copilot AI reviewed Apr 5, 2026

View reviewed changes

map588 force-pushed the fix/crash-safety branch from c58796c to c8661b9 Compare April 5, 2026 03:07

map588 requested a review from Copilot April 5, 2026 03:33

Copilot started reviewing on behalf of map588 April 5, 2026 03:33 View session

Copilot AI reviewed Apr 5, 2026

View reviewed changes

-    if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) {
+    int rc = sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL);
+    if (rc != SQLITE_OK || !stmt) {
+        if (stmt != NULL) {
+            sqlite3_finalize(stmt);
+        }

	expr_free(kase->branches[i].when_expr);
	expr_free(kase->branches[i].when_expr);
	free((void *)kase->branches[i].then_val);

	return CBM_NOT_FOUND;
	store_set_error_sqlite(s, "sqlite3_prepare_v2 failed in collect_pkg_names");
	return CBM_STORE_ERR;

Conversation

map588 commented Apr 5, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants